What’s this all about?

In this tutorial, we will create a Github repository which we will use to store and edit some simple R scripts. This tutorial is broken into 2 parts - “Using Github in Linux” and “Using Github in RStudio”.

Why use Github?

Whether you are a programming expert, or you are still learning the basics and use programming simply as a means to an end in your research, we are all effectively software developers to some degree. This is 2022! These days, pretty much all research in the field of atmospheric sciences is performed using software programming of some kind.

Keeping track of the code you write can be a hassle, especially if you spend a lot of time exploring the “what ifs” of scientific analysis through your code. You may have 15 R-scripts each containing thousands of lines of code, but what if you want to perform ten sensitivity analyses, each with only a small modification to a few lines of code, within only one or two of these scripts? You may be tempted to take your original code, copy it ten times, modify each copy slightly, and call it a day. But, what happens if you find a bug in your original code? You would then need to go back and fix this bug in all ten of your new copies of your analysis! Yuck.

Github is extremely useful in this situation. By hosting your code on Github, you can maintain a single copy of your “main” or “baseline” code, and create “branches” of this code with modifications made to carry out your sensitivity analysis “side quests”. Using Github, you can easily make changes to your “baseline” code and apply these to your branches. Github also keeps track of your version changes, and allows you to rewind back to earlier versions if you need to revisit old code.

Getting started

First, you must have an account on http://github.com. Creating an account is easy. Simply follow the link to the webpage, click “Sign up” in the top right corner, and follow the instructions to create your profile.

Once you have an account, go to your profile, click “Repositories” in the top bar, and then click the green “New” button to create a new repository:

Now, enter your repository name. You should leave the “Add a README file” and “Add .gitignore” options unchecked. We will create these ourselves. Click “Create repository” to create your repository!

The next page should look something like this:

Stay on this page, but don’t click anything yet. We’ll come back to it shortly.

Using Github in Linux

Here are instructions to create and work with a repository in the Linux environment. To learn how to work with a repository in RStudio, skip to the “Using Github in RStudio” section of this document.

In this tutorial, we are going to pretend that our beloved advisor, John, has asked us to perform a very important analysis in R. Specifically, John wants us to print some text to the R console. He also wants us to add and subtract some numbers. It is very important to John that we host this code on Github so that he can understand the nuances of this complicated analysis.

Are we up to the task? I think so.

The code

To begin, open your Terminal application and navigate to the location where you would like to store and modify your project code. I decided that I’d like to create a directory within my Documents/LAIR Group/ folder on my local laptop, so I will navigate there (yours will probably look different):

cd Documents/LAIR\ Group/

We are going to pretend that we have written some fresh code to accomplish these tasks. Download the provided zip file to this directory and unzip the file to create a subdirectory which will contain this analysis.

unzip github_tutorial.zip

Navigate into this directory and examine the contents:

cd github_tutorial
ls

It should look like this:

There are 3 important R scripts that have been created to carry out John’s analysis, along with some extra support files and a README file. Let’s take a look at each of these:

cat hello_world.r
# hello_world.r
# Description: print "Hello World!" to the console

message("Hello World!")

This script accomplishes one key goal of our analysis: to print some text to the console. Woo hoo! John will be thrilled when he sees this.

cat addition.r
# addition.r
# Description: perform some basic addition and print to the console

a <- 2
b <- 5

a_plus_b <- a + b
message(paste0("a = ", a))
message(paste0("b = ", b))
message(paste0("a + b = ", a_plus_b))

This code does some addition and prints the output to the console. Wow! John will be so impressed.

cat subtraction.r
# subtraction.r
# Description: perform some basic subtraction and print to the console

a <- 4
b <- 5

a_minus_b <- a - b
message(paste0("a = ", a))
message(paste0("b = ", b))
message(paste0("a - b = ", a_minus_b))

Here we have some code which performs subtraction. This is great! Exactly what John wanted. He’s going to be so proud.

There is also a README file in here. Let’s see what it says.

cat README
This is a README file.
It contains everything you need to know about this project.

In this project, we print "Hello World!" to the console.
We also perform some simple addition and subtraction.

How about these other files in the “useful_files/” and “useless_files/” directories? Let’s look:

ls useful_files
file_1.txt file_2.txt file_3.txt file_4.txt

What’s in these files? They all have the same contents - some “important” text.

cat useful_files/file_1.txt
important information!

Similar for the “useless_files”:

ls useless_files
file_a.txt file_b.txt file_c.txt file_d.txt

Let’s see what’s in one of these files:

cat useless_files/file_a.txt
unimportant information!

The first commit

OK, great. Now that we know about our directory contents, let’s upload this to Github so John can keep track of it. Within your github_tutorial/ directory, follow these steps to initialize your Github repository.

Start by initializing the repository:

git init

This command creates an initial branch for your project code, called “master” by default. If you’re like me, and you feel like there could be a more tasteful name for your branch than “master”, you can rename it to “main”:

git branch -m master main

The “git init” command also creates a “hidden” folder which you can see when you include the “-a” option of the “ls” command:

ls -a

This folder contains a lot of the stuff that happens under the hood of git version control (you don’t need to worry about this, but it’s neat to take a look):

Now, before “staging” our code to be added to the remote Github repository, we should be aware that there are some “useless” files in this directory which aren’t important for sharing this code with others. We can explicitly tell git which files to ignore by creating a “.gitignore” file. Do this in any text editor that you like (I’ll use vim).

vim .gitignore

Now list the files you want to ignore:

# ignore these files:
useless_files/**

This tells git to ignore any files in the “useless_files/” directory. Nice!
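If you would like to confirm that the ignore rule works before staging anything, here is a minimal sketch. It builds a throwaway directory with “mktemp” (an assumption made so that nothing in your real project is touched) and recreates only two of the tutorial’s files. It is meant as an illustration, not the only way to check.

```shell
# Sketch: confirm that .gitignore hides useless_files/ from git.
# Runs in a throwaway directory so it cannot touch your real project.
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q

mkdir useful_files useless_files
echo "important information!"   > useful_files/file_1.txt
echo "unimportant information!" > useless_files/file_a.txt
printf '# ignore these files:\nuseless_files/**\n' > .gitignore

# Prints the .gitignore rule that matches this path (exit status 0 means "ignored").
git check-ignore -v useless_files/file_a.txt

# Lists untracked paths; useless_files/ should not appear in this list.
status_output="$(git status --short)"
echo "$status_output"
```

The “git check-ignore” command is handy whenever a file is unexpectedly missing from (or present in) a commit, because it reports which rule, if any, is responsible.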

Now, we can “stage” the files in this directory with the “git add” command:

git add .
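To see what staging actually changes, it can help to compare “git status” before and after “git add”. The sketch below does this in a throwaway directory (an assumption, so your tutorial folder is left alone) with a single file.

```shell
# Sketch: watch a file move from "untracked" to "staged" in a throwaway repository.
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q

echo 'message("Hello World!")' > hello_world.r

# Before staging: "??" marks a file that git is not tracking yet.
status_before="$(git status --short)"
echo "before: $status_before"

git add .

# After staging: "A" marks a new file that will be part of the next commit.
status_after="$(git status --short)"
echo "after:  $status_after"
```

Running “git status” in your own github_tutorial/ directory at this point gives the same kind of summary for the real files.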

Then, we “commit” these changes and give them a description. The “-m” flag allows us to attach a message describing this commit. Since this is the first time we’re recording files in the repository, let’s call this “first commit”.

git commit -m "first commit"

You should get some output that looks like this:

To finish off our first commit, we need to go back to our github repository webpage and copy the repository url that it has given us:

Once the url is copied to the clipboard, you can link the local repository you’ve initialized to the online repository that you made:

git remote add origin [url that you copied]

Now, verify that the remote url is connected to this directory:

git remote -v

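Finally, push the code to the online (remote) repository. Because we renamed our branch to “main” earlier, that is the branch we push (if you kept the default name, use “master” instead). The “-u” option records the remote branch as the upstream of your local branch:

git push -u origin main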

You should get some output that looks like this:

Congratulations! You just made your first commit. Note that the “git init” and “git remote add origin [url]” commands only need to be executed once during your first commit. The “git add”, “git commit”, and “git push” commands are the three commands that you will use for every future commit.
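If you want to rehearse the whole cycle without touching Github, the sketch below runs it end to end against a local “bare” repository that stands in for the remote. That stand-in, the temporary directory, and the demo identity are all assumptions made so the example runs offline and without credentials. With a real Github repository you would use the url you copied instead.

```shell
# Sketch: the full init / add / commit / remote / push cycle, using a local
# bare repository as a stand-in for the Github remote.
work_root="$(mktemp -d)"
git init -q --bare "$work_root/remote.git"      # plays the role of Github

mkdir "$work_root/github_tutorial"
cd "$work_root/github_tutorial"
git init -q
git config user.name  "Demo User"               # placeholder identity for the demo
git config user.email "demo@example.com"

echo 'message("Hello World!")' > hello_world.r
git add .
git commit -q -m "first commit"
git branch -M main                              # make sure the branch is called "main"

git remote add origin "$work_root/remote.git"   # on Github this is the copied url
git remote -v
git push -q -u origin main

# The remote now holds the same commit as the local branch.
local_head="$(git rev-parse main)"
remote_head="$(git --git-dir="$work_root/remote.git" rev-parse main)"
echo "local:  $local_head"
echo "remote: $remote_head"
```

The two hashes printed at the end should match, which is a quick way to confirm that a push really delivered your latest commit.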

Now, go to your online Github repository page and view the contents of your repository. They should be updated to contain the files you just pushed.

The page should look something like this:

Notice how the “useless files” didn’t get added to the repository. Nice!

Pushing updates

John is ecstatic about these updates. “I’m impressed,” says John. “Now, I would love it if we can change ‘b’ to the number 6 in ‘addition.r’, and ‘a’ to the number 8 in ‘subtraction.r’.” Alright - looks like we’ve got some editing to do.

From your main directory, use a text editor to edit the “addition.r” script:

b <- 6

Do the same thing for the “subtraction.r” script:

a <- 8

Now, stage these changes, commit them with a new message, and push to the remote repository:

git add .
git commit -m "changed variables in addition.r and subtraction.r"
git push origin main

You should get output that looks like this:

Now, look at your github repo webpage and ensure these changes were pushed:

You can see the details of this commit by clicking on the text “changed variables in addition.r and subtraction.r”
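The same details are available from the terminal. Here is a small sketch, run in a throwaway repository with a demo identity (both assumptions), that edits one script and then inspects the change with “git diff”, “git log”, and “git show”.

```shell
# Sketch: review an edit before and after committing it.
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q
git config user.name  "Demo User"          # placeholder identity for the demo
git config user.email "demo@example.com"

printf 'a <- 2\nb <- 5\n' > addition.r
git add .
git commit -q -m "first commit"

# Edit the script, then look at the unstaged change line by line.
printf 'a <- 2\nb <- 6\n' > addition.r
git diff

git add .
git commit -q -m "changed variables in addition.r"

git log --oneline                          # one line per commit, newest first
git show --stat HEAD                       # summary of the most recent commit

# Files touched by the most recent commit.
changed_files="$(git diff --name-only HEAD~1 HEAD)"
echo "$changed_files"
```

Checking “git diff” before staging is a cheap habit that catches accidental edits before they end up in a commit.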

Creating a new branch

John takes a look at your update. “This is great progress,” he says. “Let’s be sure to save this version of the code. Next, let’s explore another case of this problem where we save the output of the addition script, and then perform some multiplication on it.”

OK. John liked the first analysis, so we want to save it as our “Baseline” case. But, we want to explore this “what if” scenario and include some additional features without modifying any of the original baseline code. For this, we want to create a “branch” - essentially, a new copy of the original code which we can modify and re-upload to Github without altering the original version.

The first step to creating a new branch is to give it a unique name. Let’s call it the “multiplication_case”:

git checkout -b multiplication_case

The “checkout” command allows you to switch from one branch to another. The “-b” option indicates the creation of a brand new branch.
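To convince yourself that work on a branch leaves the baseline alone, here is a sketch in a throwaway repository. The file name “multiplication.r”, its contents, and the demo identity are hypothetical, chosen only for illustration.

```shell
# Sketch: create a branch, commit to it, and confirm "main" is untouched.
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q
git config user.name  "Demo User"          # placeholder identity for the demo
git config user.email "demo@example.com"

printf 'a <- 2\nb <- 6\n' > addition.r
git add .
git commit -q -m "first commit"
git branch -M main                         # make sure the branch is called "main"

git checkout -q -b multiplication_case     # create the new branch and switch to it
git branch                                 # "*" marks the branch you are on
current_branch="$(git rev-parse --abbrev-ref HEAD)"

# multiplication.r is a hypothetical file name used only for this sketch.
echo 'message(7 * 3)' > multiplication.r
git add .
git commit -q -m "add multiplication script"

git checkout -q main                       # switch back to the baseline code
ls                                         # multiplication.r is not listed here
```

Switching back with “git checkout multiplication_case” brings the new file back into the working directory, since each branch keeps its own version of the files.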

We want to define our multiplication_case branch such that it is built from the original code - that is, if any changes to the “main” branch are pushed to the repository, we want to easily be able to apply those changes to this new code. To do this, we will set the remote “main” branch (“origin/main”) as the “upstream” branch using this command:

git branch --set-upstream-to=origin/main

Now check that this new branch is set to pull updates from “origin/main”

git branch -vv

Your output should look something like this:
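The exact “git branch -vv” text includes commit hashes that differ on every machine, so a more direct check can be useful. The sketch below uses a local bare repository as a stand-in for Github and a demo identity (both assumptions). It asks git for the upstream name of the new branch and then shows a later change on “main” being pulled into the branch.

```shell
# Sketch: point a branch at origin/main and pull a later fix from it.
work_root="$(mktemp -d)"
git init -q --bare "$work_root/remote.git"      # stand-in for the Github remote

mkdir "$work_root/github_tutorial"
cd "$work_root/github_tutorial"
git init -q
git config user.name  "Demo User"               # placeholder identity for the demo
git config user.email "demo@example.com"

printf 'a <- 2\nb <- 6\n' > addition.r
git add .
git commit -q -m "first commit"
git branch -M main
git remote add origin "$work_root/remote.git"
git push -q -u origin main

git checkout -q -b multiplication_case
git branch --set-upstream-to=origin/main        # applies to the current branch

# Ask git which branch the current branch tracks.
upstream="$(git rev-parse --abbrev-ref --symbolic-full-name '@{upstream}')"
echo "upstream: $upstream"

# A later fix is committed on "main" and pushed ...
git checkout -q main
printf 'a <- 2\nb <- 7\n' > addition.r
git commit -q -a -m "update b on main"
git push -q origin main

# ... and can then be brought into the branch with a pull.
git checkout -q multiplication_case
git pull -q --ff-only
```

In this sketch the branch has no commits of its own, so the pull is a simple fast-forward. Once the branch carries its own changes, git would need to merge the two histories instead.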